Self-Driving Car Engineer Nanodegree

Deep Learning

Project: Build a Traffic Sign Recognition Classifier

In this notebook, a template is provided for you to implement your functionality in stages which is required to successfully complete this project. If additional code is required that cannot be included in the notebook, be sure that the Python code is successfully imported and included in your submission, if necessary. Sections that begin with 'Implementation' in the header indicate where you should begin your implementation for your project. Note that some sections of implementation are optional, and will be marked with 'Optional' in the header.

In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.

Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. In addition, Markdown cells can be edited by typically double-clicking the cell to enter edit mode.


Step 0: Load The Data

In [1]:
# Load pickled data
import pickle

training_file = "traffic-signs-data/train.p"
testing_file = "traffic-signs-data/test.p"

with open(training_file, mode='rb') as f:
    train = pickle.load(f)
with open(testing_file, mode='rb') as f:
    test = pickle.load(f)
    
X_train_input, y_train = train['features'], train['labels']
X_test_input, y_test = test['features'], test['labels']

Step 1: Dataset Summary & Exploration

The pickled data is a dictionary with 4 key/value pairs:

  • 'features' is a 4D array containing raw pixel data of the traffic sign images, (num examples, width, height, channels).
  • 'labels' is a 2D array containing the label/class id of the traffic sign. The file signnames.csv contains id -> name mappings for each id.
  • 'sizes' is a list containing tuples, (width, height) representing the the original width and height the image.
  • 'coords' is a list containing tuples, (x1, y1, x2, y2) representing coordinates of a bounding box around the sign in the image. THESE COORDINATES ASSUME THE ORIGINAL IMAGE. THE PICKLED DATA CONTAINS RESIZED VERSIONS (32 by 32) OF THESE IMAGES

Complete the basic data summary below.

In [2]:
# Summarize dataset
n_train = len(X_train_input)
n_test = len(X_test_input)
image_shape = X_train_input[0].shape
n_classes = len(set(y_train))

print("Number of training examples =", n_train)
print("Number of testing examples =", n_test)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
Number of training examples = 39209
Number of testing examples = 12630
Image data shape = (32, 32, 3)
Number of classes = 43

Visualize the German Traffic Signs Dataset using the pickled file(s). This is open ended, suggestions include: plotting traffic sign images, plotting the count of each sign, etc.

The Matplotlib examples and gallery pages are a great resource for doing visualizations in Python.

NOTE: It's recommended you start with something simple first. If you wish to do more, come back to it after you've completed the rest of the sections.

In [3]:
### Data exploration visualization goes here.
### Feel free to use as many code cells as needed.
import matplotlib.pyplot as plt
# Visualizations will be shown in the notebook.
%matplotlib inline

Step 2: Design and Test a Model Architecture

Design and implement a deep learning model that learns to recognize traffic signs. Train and test your model on the German Traffic Sign Dataset.

There are various aspects to consider when thinking about this problem:

  • Neural network architecture
  • Play around preprocessing techniques (normalization, rgb to grayscale, etc)
  • Number of examples per label (some have more than others).
  • Generate fake data.

Here is an example of a published baseline model on this problem. It's not required to be familiar with the approach used in the paper but, it's good practice to try to read papers like these.

NOTE: The LeNet-5 implementation shown in the classroom at the end of the CNN lesson is a solid starting point. You'll have to change the number of classes and possibly the preprocessing, but aside from that it's plug and play!

In [4]:
# Global variable for settings and tuning
DEBUG = 1
VALIDATION_RATIO = 0.2
In [5]:
# Analyze and preprocess data.

import cv2
import numpy as np

def display(img, label):
    plt.figure()
    plt.imshow(img)
    plt.suptitle("%s" % (label))
    plt.tight_layout()
    plt.axis('off')
    plt.show()

def display_many(images, labels):
    title_font = {'size':'40'}
    columns = 5
    rows = len(images)/columns
    plt.figure(figsize=(64, 64))
    for idx, img in enumerate(images):
        axis = plt.subplot(rows + 1, columns, idx+1)
        axis.xaxis.set_visible(False)
        axis.yaxis.set_visible(False)
        img = np.reshape(img, (image_shape[0], image_shape[1]))
        plt.imshow(img, cmap='gray')
        plt.title("[%d] %s" %(idx, labels[idx]), **title_font)
    plt.tight_layout()
    plt.show()
        
def get_reshaped_image(img):
    img = cv2.resize(img, (32, 32))
    return np.reshape(img, (32, 32, 1))

def grayscale(img):
    return cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

def normalize(arr):
    arr=arr.astype('float32')
    if arr.max() > 1.0:
        arr/=255.0
    return arr

def extract_bounding_box(img):
    #TODO: extract out data represented by the bounding box
    return img

def augment_data(image, label, count):
    #print("Augment count = ", str(count))
    images = []
    images.append(image)
    random.seed(SEED)
    for i in range(count):
        factor = random.randint(-1 * AUGMENT_ANGLE_RANGE, AUGMENT_ANGLE_RANGE)

        # Rotate
        M = cv2.getRotationMatrix2D((IMAGE_SIZE/2, IMAGE_SIZE/2), factor,1)
        dst = cv2.warpAffine(image,M,(IMAGE_SIZE,IMAGE_SIZE))

        factor1 = random.randint(-1 * AUGMENT_TRANSLATE_RANGE, AUGMENT_TRANSLATE_RANGE)
        factor2 = random.randint(-1 * AUGMENT_TRANSLATE_RANGE, AUGMENT_TRANSLATE_RANGE)
        # Shift
        M = np.float32([[1,0,factor1],[0,1,factor2]])
        dst = cv2.warpAffine(dst,M,(IMAGE_SIZE,IMAGE_SIZE))

        images.append(dst)
    return images

def analyse_data(labels):
    counts = []
    labels = list(labels)
    
    for idx, label in enumerate(set(labels)):
        counts.append(labels.count(label))
    
    # Plot a graph to analyse data and decide how to augment
    fig = plt.figure()
    ax1 = fig.add_axes([0,0,1., 1.])
    ax1.bar(range(0, n_classes), counts, 1)
    ax1.set_title('Number of images per class')
    plt.show()
  
    max_class_images = max(counts)
    augment_counts = [int(max_class_images/count - 1) for count in counts]
    #print(set(labels), counts, diff_counts)
    return augment_counts

print("Augmentation count per image: " + str(analyse_data(y_train)))

def preprocess_list(X_list):
    X_new = []
    for i, X in enumerate(X_list):
        X = grayscale(X)
        X = normalize(X)
        X = extract_bounding_box(X)
        X = get_reshaped_image(X)
        X_new.append(X)
    return X_new

X_train = preprocess_list(X_train_input)
X_test = preprocess_list(X_test_input)
Augmentation count per image: [9, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 1, 2, 4, 1, 0, 9, 5, 5, 4, 3, 7, 0, 2, 8, 3, 7, 4, 1, 8, 2, 4, 0, 4, 9, 0, 6, 5, 8, 8]
In [6]:
# Plot Reference labels and corresponding images

# Load label data from csv file
import csv
with open('signnames.csv', 'r') as csvfile:
    labelreader = csv.reader(csvfile, delimiter=',', quotechar='|')
    
    sample_labels = []
    for idx, row in enumerate(labelreader):
        if(idx == 0):
            continue
        sample_labels.append(row[1])
        print (idx, row[1])

# Gather one image per class
sample_images = []
y_list = list(y_train)
for idx, label in enumerate(set(y_list)):
    id_label = y_list.index(label)
    sample_images.append(X_train[id_label])

if DEBUG:
    display_many(sample_images, sample_labels)
1 Speed limit (20km/h)
2 Speed limit (30km/h)
3 Speed limit (50km/h)
4 Speed limit (60km/h)
5 Speed limit (70km/h)
6 Speed limit (80km/h)
7 End of speed limit (80km/h)
8 Speed limit (100km/h)
9 Speed limit (120km/h)
10 No passing
11 No passing for vehicles over 3.5 metric tons
12 Right-of-way at the next intersection
13 Priority road
14 Yield
15 Stop
16 No vehicles
17 Vehicles over 3.5 metric tons prohibited
18 No entry
19 General caution
20 Dangerous curve to the left
21 Dangerous curve to the right
22 Double curve
23 Bumpy road
24 Slippery road
25 Road narrows on the right
26 Road work
27 Traffic signals
28 Pedestrians
29 Children crossing
30 Bicycles crossing
31 Beware of ice/snow
32 Wild animals crossing
33 End of all speed and passing limits
34 Turn right ahead
35 Turn left ahead
36 Ahead only
37 Go straight or right
38 Go straight or left
39 Keep right
40 Keep left
41 Roundabout mandatory
42 End of no passing
43 End of no passing by vehicles over 3.5 metric tons

Question 1

Describe how you preprocessed the data. Why did you choose that technique?

Answer:

Please refer to the cells above for code

  1. Converted input data to grayscale
  2. Normalized input data to ensure the weights are evenly distrbuted
  3. Reshape input data to format accepted by network
  4. TODO: Use the bounding box to define the region and interest and cut out only those parts contained by the bounding box
  5. TODO: Analyzed the data to check if the input data is biased towards a particular label(i.e. more data for label x compared to average images per label)
    • If data is biased, add more synthetic data by rotating and translating input data

Question 2

Describe how you set up the training, validation and testing data for your model. Optional: If you generated additional data, how did you generate the data? Why did you generate the data? What are the differences in the new dataset (with generated data) from the original dataset?

Answer:

Current implementation uses LeNet network to classify input images into one out of the 43 classes

Training and validation set generation

  1. Preprocess data as mentioned above
  2. Use training data and split into training and validation set using a ratio (for current implementation 0.2)
  3. Define a batch size (current implementation 128) to operate on data depending on the memory available on GPU

Testing set generation

  1. Load data from pickle file

External test set generation

  1. Gather traffic signal images online and to evaluate performance of the network
  2. Crop traffic signs such that significant part of the image is traffic sign. In a real world scenario, object detection algorithm would detect these bounding boxes and feed them to the network
  3. Gather corresponding truth value and name the files with those truth values

Generating additional data

As noticed in the visualization above (Image caption 'Number of images per class') it can be seen that image data is not evenly distributed. Few labels have lesser samples available whereas some others have high amount of samples available.

Uneven distribution of data could lead to biased behavior of network and traffic signs with less number of signs are less likely to be detected correctly. To overcome this situation images for the classes with less number of samples can be augmented by generating synthetic data of augmented data. New images could be generating by rotation and shifting input data by small random amounts to generate variety of evenly distributed input data

In [7]:
# Split data into training/validation sets. Testing data is already available in pickle data
from sklearn.utils import shuffle

X_train, y_train = shuffle(X_train, y_train)

validation_offset = int((VALIDATION_RATIO) * len(X_train))

# Use initial part of data for training
X_train = X_train[validation_offset:]
y_train = y_train[validation_offset:]

# Generate a validation set. using later part of split
X_validation = X_train[:validation_offset]
y_validation = y_train[:validation_offset]

# Update image shape parameters (since we are operating on b/w images)
image_shape = X_train[0].shape

print("Data split: ")
print("------------")
print("   Training set   = %d images"%(len(X_train)))
print("   Validation set = %d images"%(len(X_validation)))
print("   Test set       = %d images"%(len(X_test)))

#Additional data can be generated using augmentation
#1. Rotate the images by an angle
#2. Translate images (shifting in x, y or both axes)
# Code available needs to be integrated
Data split: 
------------
   Training set   = 31368 images
   Validation set = 7841 images
   Test set       = 12630 images
In [8]:
#TODO: Augment input data using augment_data function defined above

Question 3

What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.) For reference on how to build a deep neural network using TensorFlow, see Deep Neural Network in TensorFlow from the classroom.

Answer:

Network Architecture:

Model:

Current implementation uses LeNet network with following layers and connections

Layers and their details:

  1. Convolution (5 x 5, 32 kernels, stride 1)
  2. RELU
  3. Pool (Max pooling with 2 x 2 kernels and stride of 2)
  4. Convolution (5 x 5, 16 kernels, stride 1)
  5. RELU
  6. Pool (Max pooling with 2 x 2 kernels and stride of 2)
  7. FC (400 neurons input 120 neurons output)
  8. RELU
  9. FC (120 neurons input 84 neurons output)
  10. RELU
  11. FC (84 neurons input number of classes as output size)

Input and Output:

Two placeholders have been defined to

  1. Accept input data
  2. Communicate the output data
In [9]:
# Network definition
from tensorflow.contrib.layers import flatten

def LeNet(x):    
    # Hyperparameters
    mu = 0
    sigma = 0.1
    
    # SOLUTION: Layer 1: Convolutional. Input = 32x32x1. Output = 28x28x6.
    conv1_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 1, 6), mean = mu, stddev = sigma))
    conv1_b = tf.Variable(tf.zeros(6))
    conv1   = tf.nn.conv2d(x, conv1_W, strides=[1, 1, 1, 1], padding='VALID') + conv1_b

    # SOLUTION: Activation.
    conv1 = tf.nn.relu(conv1)

    # SOLUTION: Pooling. Input = 28x28x6. Output = 14x14x6.
    conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')

    # SOLUTION: Layer 2: Convolutional. Output = 10x10x16.
    conv2_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 6, 16), mean = mu, stddev = sigma))
    conv2_b = tf.Variable(tf.zeros(16))
    conv2   = tf.nn.conv2d(conv1, conv2_W, strides=[1, 1, 1, 1], padding='VALID') + conv2_b
    
    # SOLUTION: Activation.
    conv2 = tf.nn.relu(conv2)

    # SOLUTION: Pooling. Input = 10x10x16. Output = 5x5x16.
    conv2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')

    # SOLUTION: Flatten. Input = 5x5x16. Output = 400.
    fc0   = flatten(conv2)
    
    # SOLUTION: Layer 3: Fully Connected. Input = 400. Output = 120.
    fc1_W = tf.Variable(tf.truncated_normal(shape=(400, 120), mean = mu, stddev = sigma))
    fc1_b = tf.Variable(tf.zeros(120))
    fc1   = tf.matmul(fc0, fc1_W) + fc1_b
    
    # SOLUTION: Activation.
    fc1    = tf.nn.relu(fc1)

    # SOLUTION: Layer 4: Fully Connected. Input = 120. Output = 84.
    fc2_W  = tf.Variable(tf.truncated_normal(shape=(120, 84), mean = mu, stddev = sigma))
    fc2_b  = tf.Variable(tf.zeros(84))
    fc2    = tf.matmul(fc1, fc2_W) + fc2_b
    
    # SOLUTION: Activation.
    fc2    = tf.nn.relu(fc2)

    # SOLUTION: Layer 5: Fully Connected. Input = 84. Output = class_count.
    fc3_W  = tf.Variable(tf.truncated_normal(shape=(84, n_classes), mean = mu, stddev = sigma))
    fc3_b  = tf.Variable(tf.zeros(n_classes))
    logits = tf.matmul(fc2, fc3_W) + fc3_b
    
    return logits
In [10]:
# Define input and output placeholders to interface with tensorflow
import tensorflow as tf

x = tf.placeholder(tf.float32, (None, image_shape[0], image_shape[1], image_shape[2]))
y = tf.placeholder(tf.int32, (None))
one_hot_y = tf.one_hot(y, n_classes)

Question 4

How did you train your model? (Type of optimizer, batch size, epochs, hyperparameters, etc.)

Answer:

While training the network, validation loss and training batch loss were considered before updating the hyperparameters

Optimizer: AdamOptimizer

  • AdamOptimizer is amoung the best available optimizers to train the network in terms of convergence, stability and ease of use

Training batch size: 128

  • Since a GPU was available for training, even higher batch size would be possible to accomodate. Found batch size of 128 fit to operate without significantly affecting other processing relying on GPU

Epochs executed: 25

  • Training performed on restored weights (when available) loaded from previously stored execution. The saved weights were deleted while experimenting with differet architecture.
  • Since the learning rate is good enough, restoring the model only serves to have a good starting point. With this approach, validation accuracy near 99% could be achieved without overfitting on input database. Performance on test data was observed to be good after using restored weights.

INTENTIONALLY NOT USING RESTORED WEIGHTS TO DEMONSTRATE top_k PREDICTION SIGNIFICANCE

Learning rate: 0.001

  • Experimented with learning rates ranging from 0.1 to 0.00001 and found the network to perform well at 0.001

  • Here grayscale images were used to train the network but color images could be used too

Validation and visualization

  • Apart from the steps mentioned above, visualizations were performed on the data found to be estimated wrong. All the validation samples for which outputs were incorrect were plotted and analyzed to get an idea of what the network is lacking to understand and which parameter to tune accordingly.
In [11]:
# Define optimizer parameters
rate = 0.001

logits = LeNet(x)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits, one_hot_y)
loss_operation = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate = rate)
training_operation = optimizer.minimize(loss_operation)
In [12]:
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_y, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
saver = tf.train.Saver()

def get_labels_from_probabilities(probabilities):
    predicted_labels = []
    for probability_list in probabilities:
        probability_list = list(probability_list)
        predicted_labels.append(probability_list.index(max(probability_list)))
    return predicted_labels

def display_comparison(img1, img2, label1, label2):
    img1 = np.reshape(img1, (image_shape[0], image_shape[1]))
    img2 = np.reshape(img2, (image_shape[0], image_shape[1]))
    plt.figure()

    axis = plt.subplot(1, 2, 1)
    axis.xaxis.set_visible(False)
    axis.yaxis.set_visible(False)
    axis.title.set_text(label1)
    plt.imshow(img1, cmap='gray')

    axis = plt.subplot(1, 2, 2)
    axis.xaxis.set_visible(False)
    axis.yaxis.set_visible(False)
    axis.title.set_text(label2)
    plt.imshow(img2, cmap='gray')
    
    plt.tight_layout()
    plt.show()

def display_incorrect_predictions(predictions, x, y):
    p_labels = get_labels_from_predictions(predictions)
    for i in range(len(y)):
        if(p_labels[i] != y[i]):
            label1 = "Input : " + sample_labels[y[i]] + "                  " 
            label2 = "Predicted:" + sample_labels[p_labels[i]]
            display_comparison(x[i], sample_images[p_labels[i]], label1, label2)

def evaluate(X_data, y_data, display_incorrect=False):
    num_examples = len(X_data)
    total_accuracy = 0

    sess = tf.get_default_session()
    for offset in range(0, num_examples, BATCH_SIZE):
        batch_x, batch_y = X_data[offset:offset+BATCH_SIZE], y_data[offset:offset+BATCH_SIZE]
        predictions = sess.run(logits, feed_dict={x: batch_x, y: batch_y})
        accuracy = sess.run(accuracy_operation, feed_dict={x: batch_x, y: batch_y})
        total_accuracy += (accuracy * len(batch_x))
        if display_incorrect:
            display_incorrect_predictions(predictions, batch_x, batch_y)
    return total_accuracy / num_examples, predictions

def predict(X_data):
    num_examples = len(X_data)
    total_accuracy = 0

    sess = tf.get_default_session()
    for offset in range(0, num_examples, BATCH_SIZE):
        batch_x = X_data[offset:offset+BATCH_SIZE]
        probabilities = sess.run(logits, feed_dict={x: batch_x})
        p_labels = get_labels_from_probabilities(probabilities)
    return p_labels, probabilities
In [21]:
# Training the network
EPOCHS = 25
BATCH_SIZE = 128

import os
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    files = os.listdir('.')
    found = False
    for file in files:
        if file == 'checkpoint':
            found = True
    if found:
        print ("Checkpoint located, restoring model")
        saver.restore(sess, tf.train.latest_checkpoint('.'))
    else:
        print("Could not locate checkpoint file. Creating a new model")
    num_examples = len(X_train)
    
    print("Training...")
    print()
    for i in range(EPOCHS):
        X_train, y_train = shuffle(X_train, y_train)
        for offset in range(0, num_examples, BATCH_SIZE):
            end = offset + BATCH_SIZE
            batch_x, batch_y = X_train[offset:end], y_train[offset:end]
            sess.run(training_operation, feed_dict={x: batch_x, y: batch_y})
            
        validation_accuracy, predictions = evaluate(X_validation, y_validation)
        print('EPOCH %d/%d '%(i+1, EPOCHS) +
              'Validation Accuracy = %.5f '%validation_accuracy)
    print()
        
    saver.save(sess, 'project2-net')
    print("Model saved")
Could not locate checkpoint file. Creating a new model
Training...

EPOCH 1/25 Validation Accuracy = 0.70960 
EPOCH 2/25 Validation Accuracy = 0.86099 
EPOCH 3/25 Validation Accuracy = 0.88165 
EPOCH 4/25 Validation Accuracy = 0.93100 
EPOCH 5/25 Validation Accuracy = 0.95013 
EPOCH 6/25 Validation Accuracy = 0.96582 
EPOCH 7/25 Validation Accuracy = 0.96238 
EPOCH 8/25 Validation Accuracy = 0.97054 
EPOCH 9/25 Validation Accuracy = 0.97130 
EPOCH 10/25 Validation Accuracy = 0.98355 
EPOCH 11/25 Validation Accuracy = 0.98215 
EPOCH 12/25 Validation Accuracy = 0.98431 
EPOCH 13/25 Validation Accuracy = 0.97092 
EPOCH 14/25 Validation Accuracy = 0.98725 
EPOCH 15/25 Validation Accuracy = 0.98610 
EPOCH 16/25 Validation Accuracy = 0.99286 
EPOCH 17/25 Validation Accuracy = 0.98674 
EPOCH 18/25 Validation Accuracy = 0.99222 
EPOCH 19/25 Validation Accuracy = 0.99273 
EPOCH 20/25 Validation Accuracy = 0.99056 
EPOCH 21/25 Validation Accuracy = 0.98776 
EPOCH 22/25 Validation Accuracy = 0.99566 
EPOCH 23/25 Validation Accuracy = 0.99605 
EPOCH 24/25 Validation Accuracy = 0.99554 
EPOCH 25/25 Validation Accuracy = 0.99286 

Model saved
In [22]:
DEBUG = 0
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))

    compare_incorrect = False
    if DEBUG:
        compare_incorrect = True
    test_accuracy, predictions = evaluate(X_test, y_test, display_incorrect=compare_incorrect)
    predicted_labels = get_labels_from_probabilities(predictions)
    print("Test Accuracy = {:.3f}".format(test_accuracy))
Test Accuracy = 0.896

Question 5

What approach did you take in coming up with a solution to this problem? It may have been a process of trial and error, in which case, outline the steps you took to get to the final solution and why you chose those steps. Perhaps your solution involved an already well known implementation or architecture. In this case, discuss why you think this is suitable for the current problem.

Answer:

  • This is a multi class classification problem. The solution demanded classification of 43 classes using 32 x 32 images.
  • LeNet, AlexNet would have been better choices to operate on low resolution images with lesser samples
  • Started with LeNet with an aim to understand and modify later if the performance does not meet requirements
  • Other options available: ResNet, GoogLeNet (Usually used on high resolution images)

Step 3: Test a Model on New Images

Take several pictures of traffic signs that you find on the web or around you (at least five), and run them through your classifier on your computer to produce example results. The classifier might not recognize some local signs but it could prove interesting nonetheless.

You may find signnames.csv useful as it contains mappings from the class id (integer) to the actual sign name.

Answer

The network was run through two sets of images

  1. Images taken from internet http://www.gettingaroundgermany.info/zeichen.shtml#reg resembling data from training set
  2. Images taken form the internet where traffic signs did not belong to classes the network was trained for

For 1. the network accuracy was observed to be 80%. Following code allows loading and visualizing the input data

In [29]:
# Load external test images  which resemble the ones from database and plot
import os
import matplotlib.image as mpimg

DEBUG = 1

# Test Generator: get_next_test_data()
def get_next_test_data(foldername, files):
    for i,file in enumerate(files):
        img = mpimg.imread(foldername + "/" + file)
        if DEBUG:
            display(img, file)
        yield img
    plt.show()
    return None

def load_test_data(foldername):
    files = os.listdir(foldername)
    images = []
    for image in get_next_test_data(foldername, files):
        images.append(image)
    return images, files

TEST_DIR = './test-data/test-dataset-de'
ext_images, ext_files = load_test_data(TEST_DIR)

Question 6

Choose five candidate images of traffic signs and provide them in the report. Are there any particular qualities of the image(s) that might make classification difficult? It could be helpful to plot the images in the notebook.

Answer: All the candidates mentioned above have been considered to analyze the network performance. Following qualities of the images could make classification difficult:

Assuming generic images taken from anywhere:

  1. They may not necessarily be in the training set
  2. Unwanted data would result to incorrect inference. Input images should be cropped to contain only the traffic sign information ### For test images provided in the dataset:
  3. Noise profile may affect inference if the network is not trained well
  4. Valid data which do not match with training data. This has to do with network lacking generalization. Overfitting is one of the reasons network fail to generate correct inferences on test data inspite of being trained well
  5. Non-affine projective transformations could make it difficult to classify signs. This means it would be difficult to classify a traffic sign image taken from a camera below the sign compared to the one taken from a camer in front of the sign
In [25]:
# Run predictions on loaded test data

ext_test_data = preprocess_list(ext_images)
DEBUG = 1
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))

    predictions, predict_prob = predict(ext_test_data)
    print (predictions)
    if DEBUG:
        for i, p_l in enumerate(predictions): 
            label1 = "I/P file: " + ext_files[i]
            label2 = "Predicted: " + str(sample_labels[p_l])
            display_comparison(ext_test_data[i], sample_images[p_l], label1, label2)
[14, 38, 31, 2, 11, 25, 22, 39, 36, 37]

Question 7

Is your model able to perform equally well on captured pictures when compared to testing on the dataset? The simplest way to do this check the accuracy of the predictions. For example, if the model predicted 1 out of 5 signs correctly, it's 20% accurate.

NOTE: You could check the accuracy manually by using signnames.csv (same directory). This file has a mapping from the class id (0-42) to the corresponding sign name. So, you could take the class id the model outputs, lookup the name in signnames.csv and see if it matches the sign from the image.

Answer:

The network does not perform as well on new images (from internet belonging to the dataset class).

  1. Performance of the network is probably not as good as the new data has more variation which the network is not trained with.
  2. Another reason is probably because the network has not learned to generalize the signs well
  3. Camera intrinsic parameters would modulate data and that could lead to poor performace as well
  4. Training data needs to be trimmed to contain region of interest (currently not handled). Once the code to handle bounding boxes is enabled the situation should ideally be better than now.
In [26]:
correct_labels = [14, 38, 31, 3, 27, 25, 29, 39, 36, 37]

incorrect_detections = 0
for i, pred in enumerate(predictions):
    if pred != correct_labels[i]:
        incorrect_detections += 1
        
print ("External Test data incorrect Detections : %d/%d"%(incorrect_detections, len(predictions)))
print ("Accuracy : %.2f%%"%(100.0 - 100.0 * incorrect_detections/len(predictions)))
External Test data incorrect Detections : 3/10
Accuracy : 70.00%

Question 8

Use the model's softmax probabilities to visualize the certainty of its predictions, tf.nn.top_k could prove helpful here. Which predictions is the model certain of? Uncertain? If the model was incorrect in its initial prediction, does the correct prediction appear in the top k? (k should be 5 at most)

tf.nn.top_k will return the values and indices (class ids) of the top k predictions. So if k=3, for each sign, it'll return the 3 largest probabilities (out of a possible 43) and the correspoding class ids.

Take this numpy array as an example:

# (5, 6) array
a = np.array([[ 0.24879643,  0.07032244,  0.12641572,  0.34763842,  0.07893497,
         0.12789202],
       [ 0.28086119,  0.27569815,  0.08594638,  0.0178669 ,  0.18063401,
         0.15899337],
       [ 0.26076848,  0.23664738,  0.08020603,  0.07001922,  0.1134371 ,
         0.23892179],
       [ 0.11943333,  0.29198961,  0.02605103,  0.26234032,  0.1351348 ,
         0.16505091],
       [ 0.09561176,  0.34396535,  0.0643941 ,  0.16240774,  0.24206137,
         0.09155967]])

Running it through sess.run(tf.nn.top_k(tf.constant(a), k=3)) produces:

TopKV2(values=array([[ 0.34763842,  0.24879643,  0.12789202],
       [ 0.28086119,  0.27569815,  0.18063401],
       [ 0.26076848,  0.23892179,  0.23664738],
       [ 0.29198961,  0.26234032,  0.16505091],
       [ 0.34396535,  0.24206137,  0.16240774]]), indices=array([[3, 0, 5],
       [0, 1, 4],
       [0, 5, 1],
       [1, 3, 5],
       [1, 4, 3]], dtype=int32))

Looking just at the first row we get [ 0.34763842, 0.24879643, 0.12789202], you can confirm these are the 3 largest probabilities in a. You'll also notice [3, 0, 5] are the corresponding indices.

Answer:

  1. If only top probability considered for infering a traffic sign, the accuracy of the network was observed to be low (around 70%)
  2. With only top two probabilities included the network accurary is as high as 90%
  3. The network however does a good job at keeping the right prediction in the top 3 probabilities (100% accuracy)

Conclusion: We could consider the top 3 probabilites while removing the noise effect and building up the confidence if the network output is not stable due to changing lighting conditions or lack of training data or any other reason leading to less probable but consistent label inferencing from the network

In [28]:
correct_prediction_ext = tf.nn.in_top_k(predict_prob, correct_labels, k=3)
accuracy_op_ext = tf.reduce_mean(100*tf.cast(correct_prediction_ext, tf.int32))

with tf.Session() as sess:
    acc, valid_pred = sess.run([accuracy_op_ext, correct_prediction_ext])

print("Prediction Accuracy: %d%%" % acc)
Prediction Accuracy: 100%

Appendix: Running random images grabbed from the internet through the network

In [32]:
TEST_DIR = './test-data/test-dataset-random'
ext_images, ext_files = load_test_data(TEST_DIR)
In [33]:
ext_test_data = preprocess_list(ext_images)
DEBUG = 1
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('.'))

    predictions, predict_prob = predict(ext_test_data)
    print (predictions)
    if DEBUG:
        for i, p_l in enumerate(predictions): 
            label1 = "I/P file: " + ext_files[i]
            label2 = "Predicted: " + str(sample_labels[p_l])
            display_comparison(ext_test_data[i], sample_images[p_l], label1, label2)
[31, 35, 1, 13, 9, 40, 40, 37, 18, 18, 40]

Some interesting Observations:

  1. The network does not behave quite well with random images taken from the internet
  2. Crocodile threat appears to be a general caution (img: crocs.png)
  3. In case there is roadwork ahead, the network in quest of adventure insists we go ahead on the same road (img: roadwork_ahead.png)
  4. In case you hit the love limit, either go ahead or left. DO NOT TAKE A RIGHT TURN (img: love_limit.jpg)
  5. Interesting to see that the network recommends users not to pass when Zombies are expected ahead :D (img: zombies_ahead.jpg)

Note: Once you have completed all of the code implementations and successfully answered each question above, you may finalize your work by exporting the iPython Notebook as an HTML document. You can do this by using the menu above and navigating to \n", "File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.